A Toolbox for Record Linkage

Authors

  • Rainer Schnell
  • Tobias Bachteler
  • Stefan Bender
Abstract

We developed a record-linkage toolbox to compare the performance of various string-similarity measures on error-prone names, including German surnames. This "Matching Tool-Box" (MTB) consists of independent, highly portable Java programs. MTB is currently used for prototyping pre-processing tools and for the empirical comparison of string-similarity measures. MTB has also been used successfully in sociological, economic, and epidemiological research projects.


Similar resources

TAILOR: A Record Linkage Tool Box

Data cleaning is a vital process that ensures the quality of data stored in real-world databases. Data cleaning problems are frequently encountered in many research areas, such as knowledge discovery in databases, data warehousing, system integration and e-services. The process of identifying the record pairs that represent the same entity (duplicate records), commonly known as record linkage, ...

Probabilistic Linkage of Persian Record with Missing Data

Extended Abstract. When comprehensive information about a topic is scattered among two or more data sets, using only one of them loses the information available in the others. Hence, it is necessary to integrate the scattered information into a single comprehensive data set. On the other hand, sometimes we are interested in recognizing duplicates within a data set. The i...

Record Linkage I: Evaluation of Commercially Available Record Linkage Software for Use in NASS

Record linkage is an important technique in NASS for minimizing the presence of duplicate names on its list sampling frame of farm operators and agribusinesses. In the late 1970s, NASS developed an automated record linkage system for this purpose, which runs on an IBM mainframe. With changes in technology, the need has arisen for portability between platforms, integration with client/server te...

A Novel Toolbox for Generating Realistic Biological Cell Geometries for Electromagnetic Microdosimetry

Researchers in bioelectromagnetics often require realistic tissue, cellular and sub-cellular geometry models for their simulations. However, biological shapes are often extremely irregular, while conventional geometrical modeling tools on the market cannot meet the demand for fast and efficient construction of irregular geometries. We have designed a free, user-friendly tool in MATLAB that comb...

A Decision Tree Based Record Linkage for Recommendation Systems

Record linkage merges all the records relating to the same entity from multiple datasets, at the entity level. It is the initial data preparation phase for most database projects. Traditionally, one-to-one data linkage is performed among entities of the same type with a common unique identifier. The proposed one-to-many and/or many-to-many record linkage method is able to link the entities ...

Publication date: 2004